Manufacturing Test for a
Fault Tolerant Magnetoresistive 
Solid-state Storage Device 

This is a continuation-in-part application of co- 
pending U.S. Patent Application No. 09/915,179, filed on 
25 July 2001, which is hereby incorporated by reference. 

The present invention relates in general to a 
magnetoresistive solid-state storage device and to a 
method for testing a magnetoresistive solid-state storage 
device. In particular, but not exclusively, the invention 
relates to a method for testing a magnetoresistive solid- 
state storage device that in use will employ error 
correction coding (ECC).

A typical solid-state storage device comprises one or 
more arrays of storage cells for storing data. Existing 
semiconductor technologies provide volatile solid-state 
storage devices suitable for relatively short term storage 
of data, such as dynamic random access memory (DRAM), or
devices for relatively longer term storage of data such as 
static random access memory (SRAM) or non-volatile flash 
and EEPROM devices. However, many other technologies are 
known or are being developed. 

Recently, a magnetoresistive storage device has been 
developed as a new type of non-volatile solid-state 
storage device (see, for example, EP-A-0918334 Hewlett-
Packard). The magnetoresistive solid-state storage device
is also known as a magnetic random access memory (MRAM) 
device. MRAM devices have relatively low power consumption 
and relatively fast access times, particularly for data 
write operations, which renders MRAM devices ideally 
suitable for both short term and long term storage 
applications.



A problem arises in that MRAM devices are subject to 
physical failure, which can result in an unacceptable loss 
of stored data. Currently available manufacturing
techniques for MRAM devices are subject to limitations and
as a result manufacturing yields of commercially
acceptable MRAM devices are relatively low. Although
better manufacturing techniques are being developed, these
tend to increase manufacturing complexity and cost. Hence,
it is desired to apply lower cost manufacturing techniques
whilst increasing device yield. Further, it is desired to
increase cell density formed on a substrate such as
silicon, but as the density increases manufacturing
tolerances become increasingly difficult to control, again
leading to higher failure rates and lower device yields.
Since the MRAM devices are at a relatively early stage in
development, it is desired to allow large scale
manufacturing of commercially acceptable devices, whilst
tolerating the limitations of current manufacturing
techniques.

An aim of the present invention is to provide a method 
for testing a magnetoresistive solid-state storage device. 
A preferred aim is to provide a test which may be employed 
at manufacture of a device, preferably prior to storage of 
active user data.

According to a first aspect of the present invention 
there is provided a method for testing a magnetoresistive 
solid-state storage device, the method comprising the 
steps of: accessing a set of magnetoresistive storage
cells, the set being arranged in use to store at least one
block of ECC encoded data; and determining whether the
accessed set of storage cells is suitable for, in use,
storing at least one block of ECC encoded data.

PDNO (3001 1265 "B") - Final Draft - 12 September 2001 Appleyard Lees - V88

Preferably, the method comprises determining whether 
original information is expected to be unrecoverable, if a
block of ECC encoded data were to be stored in the
accessed set of storage cells. In particular, it is
determined whether original information is expected to be
unrecoverable because the probability that original
information is unrecoverable is unacceptably high. In the
preferred embodiments a probability greater than of the
order of 10^-10 to 10^-20 may be considered as too high. If
so, remedial action is taken such as discarding that set
of storage cells such that the set is not available in use
to store a block of ECC encoded data. On the other hand,
where the probability is acceptable, then use of the set 
of storage cells may continue. 
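
By way of illustration only, the probability test described above can be
sketched as follows. The sketch assumes, purely for the example, that
each symbol of a block fails independently with the same probability p;
the description does not prescribe this model. The function names, the
default limit of 1e-10 and the figures of 160 symbols per block with 16
correctable symbols (taken from the example code discussed later in
this description) are likewise assumptions.

```python
from math import comb


def block_failure_probability(n: int, t: int, p: float) -> float:
    """Probability that more than t of n symbols fail.

    Assumed model: each symbol fails independently with probability p.
    """
    # Sum the upper tail directly so that very small probabilities are
    # not lost to rounding, as they would be when computing 1 - CDF.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(t + 1, n + 1))


def probability_is_acceptable(n: int, t: int, p: float,
                              limit: float = 1e-10) -> bool:
    """True when the estimated probability does not exceed the limit."""
    return block_failure_probability(n, t, p) <= limit
```

Under these assumptions, a set of cells would be discarded when
probability_is_acceptable returns False, and retained otherwise.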

Preferably, the method comprises determining, from 
accessing the set of storage cells, one or more failed
symbols in a block of ECC encoded data that would have
been affected by a physical failure. Then, suitably, a
determination is made whether there are more failed
symbols in the block of ECC encoded data than could be
reliably corrected by error correction decoding the block
of ECC encoded data. Here, a situation is identified
where, due to physical failures, ECC decoding the block of
ECC encoded data would probably fail to correctly recover
original information. In other words, there is a high
probability (i.e. close to 1) that decoding the block of
ECC encoded data would not correctly recover original
information.

The preferred test method comprises two aspects, which
can be applied either alone or preferably in combination.
The first aspect is parametric-based evaluation of the
storage cells of the MRAM device, whilst the second aspect
is a logic-based evaluation of the storage cells. These
two aspects are each particularly useful in determining
different types of physical failures which have been found
to affect MRAM devices.

In the first aspect concerning parametric-based
evaluation, the step of accessing the set of storage cells
preferably comprises the steps of obtaining parametric
values from the accessed set of storage cells and
comparing the obtained parametric values against one or
more ranges. For almost all storage cells in the MRAM
device, such comparison indicates that, in use, a logical
bit value could be successfully derived from that storage
cell. However, due to inevitable manufacturing
imperfections and other causes, a small proportion of the
storage cells in the MRAM device are expected to be
affected by physical failures. Conveniently, it has been
found that storage cells can be identified as being
affected by at least some types of physical failure, by
evaluating the obtained parametric values. Preferably, a
failed cell is identified where an obtained parametric
value falls into a predetermined failure range. In the
preferred embodiment, the obtained parametric value
represents resistance, and the predetermined failure range
represents an abnormally low resistance or an abnormally
high resistance, which indicates cells affected by
physical failures known as shorted bits and open bits,
respectively.

The second aspect employs a logic-based evaluation of
the accessed set of storage cells. Here, the step of
accessing the set of storage cells preferably comprises
the steps of writing test data to the set of storage
cells, reading the test data from the set of storage
cells, and comparing the written test data against the
read test data. It has been found that this
write-read-compare operation advantageously allows storage
cells to be identified as being affected by certain types
of physical failure. In the preferred embodiment of the
present invention, the logic-based evaluation is
particularly useful in determining physical failures known
as half-select bits and single failed bits.
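
As a minimal sketch of the write-read-compare operation, the following
example writes two complementary test patterns to a simulated set of
cells and records every cell whose read value differs from the written
value. The SimulatedCells class and the choice of all-zero and all-one
patterns are assumptions made for the example; the sketch models only
cells stuck at a fixed value, and does not attempt to model half-select
behaviour, which would require writes to individual neighbouring cells.

```python
class SimulatedCells:
    """Toy stand-in for a set of storage cells (hypothetical).

    stuck maps a cell index to the bit value that cell always holds.
    """

    def __init__(self, n_cells, stuck=None):
        self.bits = [0] * n_cells
        self.stuck = dict(stuck or {})

    def write(self, data):
        for i, bit in enumerate(data):
            # A stuck cell ignores the value written to it.
            self.bits[i] = self.stuck.get(i, bit)

    def read(self):
        return list(self.bits)


def write_read_compare(cells, n_cells):
    """Return sorted indices of cells that read back a wrong value."""
    failed = set()
    # Complementary patterns exercise cells stuck at either value.
    for pattern in ([0] * n_cells, [1] * n_cells):
        cells.write(pattern)
        readback = cells.read()
        failed.update(i for i, (w, r) in enumerate(zip(pattern, readback))
                      if w != r)
    return sorted(failed)
```

For instance, a simulated set of sixteen cells with cell 3 stuck at
zero and cell 9 stuck at one yields the failed cell list [3, 9].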

The determining step of the preferred method
preferably comprises determining a failure count, based on
the identified failed cells. That is, a failure count is
determined based on the failed cells identified by either
the parametric-based evaluation or the logic-based
evaluation, and preferably a combination of both. In one
example, the failure count can simply represent the number
of identified failed cells within the accessed set of
storage cells. Preferably, the failure count is based on
failed symbols of a block of ECC encoded data that, in
use, would be affected by the identified failed cells.
Here, the method suitably comprises determining the
position of the identified failed cells within the array
of storage cells of the MRAM device, and from this
determining the one or more symbols of ECC encoded data
which, in use, would be affected by failed storage cells
in those positions.
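
The symbol-based failure count can be sketched as follows. The mapping
used here, in which symbol k occupies eight consecutive cells of the
accessed set, is a hypothetical layout chosen only to make the example
concrete; the actual mapping from cell positions to symbols depends on
how data is organised on the device.

```python
BITS_PER_SYMBOL = 8  # eight-bit symbols, as in the preferred scheme


def failed_symbols(failed_cells, bits_per_symbol=BITS_PER_SYMBOL):
    """Map failed cell indices to the set of affected symbol indices."""
    # Hypothetical layout: symbol k occupies cells k*8 .. k*8 + 7.
    return {cell // bits_per_symbol for cell in failed_cells}


def failure_count(failed_cells, bits_per_symbol=BITS_PER_SYMBOL):
    """Number of symbols touched by at least one failed cell."""
    return len(failed_symbols(failed_cells, bits_per_symbol))
```

Several failed cells falling within the same symbol contribute only
one to the count, which is why the symbol-based count can be lower
than the simple count of failed cells.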




The determining step preferably further comprises the
step of comparing the failure count against a threshold
value. As one option, the threshold value represents, for
the accessed set of storage cells, the maximum number of
failed cells which can be tolerated in use by a block of
ECC encoded data stored in those storage cells. Here, the
threshold value conveniently represents the situation
where there is an unacceptably high probability that
original information would not be correctly recovered.
Preferably, the threshold value represents the total
number of failed symbols which can be reliably corrected
by ECC decoding a block of ECC encoded data to be stored
in the accessed set of storage cells. As a second option,
the threshold value represents a safety margin less than
the total number of failed symbols correctable in use by
ECC decoding, such as between about 50% and 95% of the
total number. In this situation the threshold value is
particularly useful in that not all physical failures in
MRAM devices can be readily identified by testing, and the
threshold value is set such that, given the identified
number of failures, it would still be reasonable to
perform ECC decoding in use, whilst allowing for an
additional number of as yet unidentified failures to
affect the block of ECC encoded data to be stored in the
accessed set of storage cells. Additionally or
alternatively, the threshold value is useful in that new
systematic failures may arise as the device ages, and in
use the device may be susceptible to random failures.
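
The two threshold options can be illustrated with a short sketch. It
assumes that a set of cells is treated as suitable when the failure
count does not exceed the threshold, and that the safety margin is
applied by rounding down; both choices, and the function names, are
assumptions made for the example.

```python
import math


def threshold_value(correctable_symbols: int, margin: float = 1.0) -> int:
    """Threshold as a fraction of the symbols the ECC can correct.

    margin = 1.0 gives the first option (the full correctable number);
    a margin of about 0.5 to 0.95 gives the second option.
    """
    if not 0.0 < margin <= 1.0:
        raise ValueError("margin must be in (0, 1]")
    return math.floor(correctable_symbols * margin)


def set_is_suitable(failure_count: int, correctable_symbols: int,
                    margin: float = 1.0) -> bool:
    """Assumed rule: suitable when the count does not exceed the threshold."""
    return failure_count <= threshold_value(correctable_symbols, margin)
```

With sixteen correctable symbols and a margin of 0.75, the threshold
is twelve, leaving four symbols in reserve for failures that testing
did not identify.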

Conveniently, in use original information is received
for storing in the MRAM device in units of a sector, such
as 512 bytes. The original information sector is error
correction encoded to form one or more blocks of ECC
encoded data. In the preferred embodiment, a linear ECC
scheme such as a Reed-Solomon code is employed.
Conveniently, each sector of original information is
encoded to form a sector of ECC encoded data comprising
four codewords. Each codeword suitably forms the block of
ECC encoded data mentioned above.

According to a second aspect of the present invention
there is provided a method for controlling a
magnetoresistive solid-state storage device, comprising
the steps of: accessing a set of magnetoresistive storage
cells, the set being arranged in use to store at least one
block of ECC encoded data; comparing parametric values
obtained by accessing the set of storage cells against one
or more ranges; identifying failed cells amongst the
accessed set of storage cells; forming a failure count
based on the identified failed cells; comparing the
failure count against a threshold value; and determining
whether the accessed set of storage cells is suitable for,
in use, storing at least one block of ECC encoded data.

According to a third aspect of the present invention
there is provided a method for controlling a
magnetoresistive solid-state storage device, comprising
the steps of: accessing a set of magnetoresistive
storage cells, the set being arranged in use to store at
least one block of ECC encoded data; writing test data
to the accessed set of storage cells; reading test data
from the accessed set of storage cells; comparing the
written test data against the read test data, to identify
failed cells amongst the accessed set of storage cells;
forming a failure count based on the identified failed
cells; comparing the failure count against a threshold
value; and determining whether the accessed set of
storage cells is suitable for, in use, storing at least
one block of ECC encoded data.

According to a fourth aspect of the present invention
there is provided a method for controlling a
magnetoresistive solid-state storage device, comprising
the steps of: accessing a set of magnetoresistive storage
cells, the set being arranged in use to store at least one
block of ECC encoded data; comparing parametric values
obtained by accessing the set of storage cells against one
or more ranges and thereby identifying failed cells
amongst the accessed set of storage cells; performing
write-read-compare on test data in the accessed set of
storage cells, to thereby identify failed cells amongst
the accessed set of storage cells; forming a failure count
based on the identified failed cells; comparing the
failure count against a threshold value; and determining
whether the accessed set of storage cells is suitable for,
in use, storing at least one block of ECC encoded data.

According to a fifth aspect of the present invention
there is provided a magnetoresistive solid-state storage
device, comprising: at least one array of magnetoresistive
storage cells; an ECC encoding unit for, in use, forming a
block of ECC encoded data from a unit of original
information; a controller arranged to store the block of
ECC encoded data in a set of the storage cells; and a test
unit arranged to access the set of storage cells, and
determine whether the accessed set of storage cells is
suitable for, in use, storing the block of ECC encoded
data.

For a better understanding of the invention, and to
show how embodiments of the same may be carried into
effect, reference will now be made, by way of example, to
the accompanying diagrammatic drawings in which:

Figure 1 is a schematic diagram showing a preferred
MRAM device including an array of storage cells;

Figure 2 shows a preferred logical data structure;

Figure 3 shows a preferred method for testing an MRAM
device, using parametric evaluation;

Figure 4 is a graph illustrating a parametric value
obtained from a storage cell of an MRAM device;

Figure 5 shows a preferred method for testing an MRAM
device, using logic-based evaluation; and

Figure 6 shows a preferred method for testing an MRAM
device using a combination of both parametric evaluation
and logic-based evaluation.

To assist a complete understanding of the present
invention, an example MRAM device will first be described
with reference to Figure 1, including a description of the
failure mechanisms found in MRAM devices. The preferred
methods for testing such MRAM devices will then be
described with reference to Figures 2 to 6.

Figure 1 shows a simplified magnetoresistive solid-
state storage device 1 comprising an array 10 of storage
cells 16. The array 10 is coupled to a controller 20
which, amongst other control elements, includes an ECC
coding and decoding unit 22 and a test unit 24. The
controller 20 and the array 10 can be formed on a single
substrate, or can be arranged separately. If desired, the
test unit 24 is arranged physically separate from the MRAM
device 1 and they are coupled together when it is desired
to test the MRAM device.

In one preferred embodiment, the array 10 comprises of
the order of 1024 by 1024 storage cells, just a few of
which are illustrated. The cells 16 are each formed at an
intersection between control lines 12 and 14. In this
example control lines 12 are arranged in rows, and control
lines 14 are arranged in columns. One row 12 and one or
more columns 14 are selected to access the required
storage cell or cells 16 (or conversely one column and
several rows, depending upon the orientation of the
array). Suitably, the row and column lines are coupled to
control circuits 18, which include a plurality of
read/write control circuits. Depending upon the
implementation, one read/write control circuit is provided
per column, or read/write control circuits are multiplexed
or shared between columns. In this example the control
lines 12 and 14 are generally orthogonal, but other more
complicated lattice structures are also possible.

In a read operation of the currently preferred MRAM
device, a single row line 12 and several column lines 14
(represented by thicker lines in Figure 1) are activated
in the array 10 by the control circuits 18, and a set of
data is read from those activated cells. This operation is
termed a slice. The row in this example is 1024 storage
cells long (l) and the accessed storage cells 16 are
separated by a minimum reading distance m, such as sixty-
four cells, to minimise cross-cell interference in the
read process. Hence, each slice provides up to
l/m = 1024/64 = 16 bits from the accessed array.
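
The slice arithmetic can be checked with a short sketch. The function
name, and the assumption that the accessed columns are evenly spaced
starting from a chosen offset, are illustrative only.

```python
def slice_columns(row_length: int = 1024, min_distance: int = 64,
                  offset: int = 0):
    """Column indices read in one slice, spaced min_distance cells apart."""
    if not 0 <= offset < min_distance:
        raise ValueError("offset must lie within one reading distance")
    return list(range(offset, row_length, min_distance))
```

With the default values the slice contains sixteen columns (0, 64,
128 and so on), matching the sixteen bits per slice stated above.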


To provide an MRAM device of a desired storage
capacity, preferably a plurality of independently
addressable arrays 10 are arranged to form a macro-array.
Conveniently, a small plurality of arrays (typically four)
are layered to form a stack, and plural stacks are
arranged together, such as in a 16 x 16 layout.
Preferably, each macro-array has a 16 x 18 x 4 or
16 x 20 x 4 layout (expressed as width x height x stack
layers). Optionally, the MRAM device comprises more than
one macro-array. In the currently preferred MRAM device
only one of the four arrays in each stack can be accessed
at any one time. Hence, a slice from a macro-array reads
a set of cells from one row of a subset of the plurality
of arrays 10, the subset preferably being one array within
each stack.

Each storage cell 16 stores one bit of data suitably
representing a numerical value and preferably a binary
value, i.e. one or zero. Suitably, each storage cell
includes two films which assume one of two stable
magnetisation orientations, known as parallel and anti-
parallel. The magnetisation orientation affects the
resistance of the storage cell. When the storage cell 16
is in the anti-parallel state, the resistance is at its
highest, and when the magnetic storage cell is in the
parallel state, the resistance is at its lowest.
Suitably, the anti-parallel state defines a zero logic
state, and the parallel state defines a one logic state,
or vice versa. As further background information,
EP-A-0918334 (Hewlett-Packard) discloses one example of
a magnetoresistive solid-state storage device which is
suitable for use in preferred embodiments of the present
invention.

Although the device is generally reliable, it has been
found that failures can occur which affect the ability of
the device to store data reliably in the storage cells 16.
Physical failures within an MRAM device can result from
many causes including manufacturing imperfections,
internal effects such as noise in a read process,
environmental effects such as temperature and surrounding
electro-magnetic noise, or ageing of the device in use. In
general, failures can be classified as either systematic
failures or random failures. Systematic failures
consistently affect a particular storage cell or a
particular group of storage cells. Random failures occur
transiently and are not consistently repeatable.
Typically, systematic failures arise as a result of
manufacturing imperfections and ageing, whilst random
failures occur in response to internal effects and to
external environmental effects.

Failures are highly undesirable and mean that at least
some storage cells in the device cannot be written to or
read from reliably. A cell affected by a failure can
become unreadable, in which case no logical value can be
read from the cell, or can become unreliable, in which
case the logical value read from the cell is not
necessarily the same as the value written to the cell
(e.g. a "1" is written but a "0" is read). The storage
capacity and reliability of the device can be severely
affected and in the worst case the entire device becomes
unusable.



Failure mechanisms take many forms, and the following
examples are amongst those identified:

1. Shorted bits - where the resistance of the storage
   cell is much lower than expected. Shorted bits tend
   to affect all storage cells lying in the same row and
   the same column.

2. Open bits - where the resistance of the storage cell
   is much higher than expected. Open bit failures can,
   but do not always, affect all storage cells lying in
   the same row or column, or both.

3. Half-select bits - where writing to a storage cell in
   a particular row or column causes another storage cell
   in the same row or column to change state. A cell
   which is vulnerable to half select will therefore
   possibly change state in response to a write access to
   any storage cell in the same row or column, resulting
   in unreliable stored data.

4. Single failed bits - where a particular storage cell
   fails (e.g. is stuck always as a "0"), but does not
   affect other storage cells and is not affected by
   activity in other storage cells.

These four example failure mechanisms are each
systematic, in that the same storage cell or cells are
consistently affected. Where the failure mechanism affects
only one cell, this can be termed an isolated failure.
Where the failure mechanism affects a group of cells, this
can be termed a grouped failure.

Whilst the storage cells of the MRAM device can be
used to store data according to any suitable logical
layout, data is preferably organised into basic data units
(e.g. bytes) which in turn are grouped into larger logical
data units (e.g. sectors). A physical failure, and in
particular a grouped failure affecting many cells, can
affect many bytes and possibly many sectors. It has been
found that keeping information about logical units such as
bytes affected by physical failures is not efficient, due
to the quantity of data involved. That is, attempts to
produce a list of all such logical units rendered unusable
due to at least one physical failure, tend to generate a
quantity of management data which is too large to handle
efficiently. Further, depending on how the data is
organised on the device, a single physical failure can
potentially affect a large number of logical data units,
such that avoiding use of all bytes, sectors or other
units affected by a failure substantially reduces the
storage capacity of the device. For example, a grouped
failure such as a shorted bit failure in just one storage
cell affects many other storage cells, which lie in the
same row or the same column. Thus, a single shorted bit
failure can affect 1023 other cells lying in the same row,
and 1023 cells lying in the same column - a total of 2047
affected cells including the shorted cell itself. These
2047 affected cells may form part of many bytes, and many
sectors, each of which would be rendered unusable by the
single grouped failure.
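
The number of cells sharing a row or a column with a single shorted
cell in a 1024 by 1024 array can be checked with the following
sketch. The function name is an assumption made for the example, and
the sketch simply enumerates positions; it does not model the
electrical behaviour of a short.

```python
def affected_by_short(row, col, n_rows=1024, n_cols=1024):
    """Positions of all cells in the same row or column as a shorted cell."""
    same_row = {(row, c) for c in range(n_cols)}
    same_col = {(r, col) for r in range(n_rows)}
    # The shorted cell itself appears in both sets and is counted once.
    return same_row | same_col
```

For a shorted cell at any position, the result holds 1023 other cells
from the row, 1023 other cells from the column, and the shorted cell
itself.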

Some improvements have been made in manufacturing
processes and device construction to reduce the number of
manufacturing failures and improve device longevity, but
this usually involves increased manufacturing costs and
complexity, and reduced device yields. Hence, techniques
are being developed which respond to failures and avoid
future loss of data. One example technique is the use of
sparing. A row identified as containing failures is made
redundant (spared) and replaced by one of a set of unused
additional spare rows, and similarly for columns.
However, either a physical replacement is required (i.e.
routing connections from the failed row or column to
instead reach the spare row or column), or else additional
control overhead is required to map logical addresses to
physical row and column lines. Only a limited sparing
capacity can be provided, since enlarging the device to
include spare rows and columns reduces device density for
a fixed area of substrate and increases manufacturing
complexity. Therefore, where failures are relatively
common, sparing is unable to cope, leading to possible loss
of data. Also, sparing is not useful in handling random
failures, and involves additional management overhead to
determine deployment of sparing capacity.
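
The address-mapping form of sparing, and the way its limited capacity
is exhausted, can be illustrated with a minimal sketch. The class and
method names are hypothetical, and the sketch covers rows only.

```python
class RowSparing:
    """Logical-to-physical row map with a fixed pool of spare rows."""

    def __init__(self, n_rows, n_spares):
        # Spare rows are assumed to sit after the normal rows.
        self.free_spares = list(range(n_rows, n_rows + n_spares))
        self.remap = {}

    def spare_out(self, failed_row):
        """Replace failed_row with a spare; False if none remain."""
        if failed_row in self.remap:
            return True
        if not self.free_spares:
            return False
        self.remap[failed_row] = self.free_spares.pop(0)
        return True

    def physical_row(self, logical_row):
        """Row actually accessed for a given logical row address."""
        return self.remap.get(logical_row, logical_row)
```

Once the pool is empty, further failed rows cannot be replaced, which
is the limitation noted above where failures are relatively common.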

The MRAM devices of the preferred embodiments of the
present invention in use employ error correction coding to
provide a device which is error tolerant, preferably to
tolerate and recover from both random failures and
systematic failures. Typically, error correction coding
involves receiving original information which it is
desired to store and forming encoded data which allows
errors to be identified and ideally corrected. The
encoded data is stored in the solid-state storage device.
At read time, the original information is recovered by
error correction decoding the encoded stored data. A wide
range of error correction coding (ECC) schemes are
available and can be employed alone or in combination.
Suitable ECC schemes include both schemes with single-bit
symbols (e.g. BCH) and schemes with multiple-bit symbols
(e.g. Reed-Solomon).

As general background information concerning error
correction coding, reference is made to the following
publication: W.W. Peterson and E.J. Weldon, Jr.,
"Error-Correcting Codes", 2nd edition, 12th printing, 1994,
MIT Press, Cambridge MA.

A more specific reference concerning Reed-Solomon
codes used in the preferred embodiments of the present
invention is: "Reed-Solomon Codes and their Applications",
ed. S.B. Wicker and V.K. Bhargava, IEEE Press, New York,
1994.

Figure 2 shows an example logical data structure used
when storing active data in the MRAM device 1. Original
information 200 is received in predetermined units such as
a sector comprising 512 bytes. Error correction coding is
performed to produce a block of encoded data 202, in this
case an encoded sector. The encoded sector 202 comprises a
plurality of symbols 206 which can be a single bit (e.g. a
BCH code with single-bit symbols) or can comprise multiple
bits (e.g. a Reed-Solomon code using multi-bit symbols).
In the preferred Reed-Solomon encoding scheme, each symbol
206 conveniently comprises eight bits. As shown in Figure
2, the encoded sector 202 comprises four codewords 204,
each comprising of the order of 144 to 160 symbols. The
eight bits corresponding to each symbol are conveniently
stored in eight storage cells 16. A physical failure
which affects any of these eight storage cells can result
in one or more of the bits being unreliable (i.e. the
wrong value is read) or unreadable (i.e. no value can be
obtained), giving a failed symbol.
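
The storage requirement implied by this data structure can be worked
through in a short sketch. It assumes that the 512 bytes of a sector
are divided evenly between the four codewords, giving 128 information
symbols per codeword; the function and key names are hypothetical.

```python
SECTOR_BYTES = 512
CODEWORDS_PER_SECTOR = 4
BITS_PER_SYMBOL = 8


def sector_layout(symbols_per_codeword: int) -> dict:
    """Symbol and cell counts for one encoded sector."""
    # Assumed: information bytes are split evenly across codewords.
    info = SECTOR_BYTES // CODEWORDS_PER_SECTOR
    cells_per_codeword = symbols_per_codeword * BITS_PER_SYMBOL
    return {
        "info_symbols_per_codeword": info,
        "parity_symbols_per_codeword": symbols_per_codeword - info,
        "cells_per_codeword": cells_per_codeword,
        "cells_per_sector": cells_per_codeword * CODEWORDS_PER_SECTOR,
    }
```

Under these assumptions, codewords of 144 symbols occupy 1152 storage
cells each, and codewords of 160 symbols occupy 1280 storage cells
each, or 5120 cells for the encoded sector.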


Error correction decoding the encoded data 202 allows
failed symbols 206 to be identified and corrected. The
preferred Reed-Solomon scheme is an example of a linear
error correcting code, which mathematically identifies and
corrects completely up to a predetermined maximum number
of failed symbols 206, depending upon the power of the
code. For example, a [160,128,33] Reed-Solomon code
producing codewords having one hundred and sixty 8-bit
symbols corresponding to one hundred and twenty-eight
original information bytes and a minimum distance of
thirty-three symbols can locate and correct up to sixteen
symbol errors. Suitably, the ECC scheme employed is
selected with a power sufficient to recover original
information 200 from the encoded data 202 in substantially
all cases. Very rarely, a block of encoded data 202 is
encountered which is affected by so many failures that the
original information 200 is unrecoverable. Even more
rarely, the failures result in a mis-correct,
where information recovered from the encoded data 202 is
not equivalent to the original information 200. Even
though the recovered information does not correspond to
the original information, a mis-correct is not readily
determined.
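
The relationship between the code parameters and the number of
correctable symbol errors can be sketched as follows. The function
name is an assumption; the relations used are the standard ones for
Reed-Solomon codes, namely a minimum distance d = n - k + 1 and a
correction capability of (d - 1) / 2 rounded down.

```python
def correctable_symbol_errors(n: int, k: int, d: int) -> int:
    """Symbol errors correctable by an [n, k, d] Reed-Solomon code."""
    # Reed-Solomon codes are maximum distance separable: d = n - k + 1.
    if d != n - k + 1:
        raise ValueError("not a Reed-Solomon parameter set")
    return (d - 1) // 2
```

For the [160,128,33] example this gives sixteen correctable symbol
errors, in agreement with the figure stated above.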

In the current MRAM devices, grouped failures tend to
affect a large group of storage cells, lying in the same
row or column. This provides an environment which is
unlike prior storage devices. The preferred embodiments
of the present invention employ an ECC scheme with multi-
bit symbols. Where manufacturing processes and device
design change over time, it may become more appropriate to
organise storage locations expecting bit-based errors and
then apply an ECC scheme using single-bit symbols, and at
least some of the following embodiments can be applied to
single-bit symbols.

Figure 3 shows a preferred method for testing the MRAM
device 1, using parametric evaluation.

In step 301 a set of storage cells is accessed,
preferably in a set of read operations. The accessed set
of storage cells corresponds to a set of cells which, in
use, would be used to store a block of ECC encoded data
such as an encoded sector 202 or a codeword 204. The
accessed set of storage cells represents a sufficient
number of storage cells for the following steps to be
performed, and any suitable set of storage cells can be
accessed. In the currently preferred embodiments, it is
convenient for the accessed set of storage cells to
represent a single codeword, or an integer number of
codewords. In the preferred ECC coding scheme each
codeword 204 is decoded in isolation, and the results from
ECC decoding plural codewords (in this case four
codewords) provide ECC decoded data corresponding to an
original information sector 200.

Step 302 comprises obtaining a plurality of parametric
values associated with the accessed set of storage cells.
Suitably, a read voltage is applied along the row and
column control lines 12, 14 causing a sense current to
flow through selected storage cells 16, which have a
resistance determined by parallel or anti-parallel
alignment of the two magnetic films. The resistance of a
particular cell is determined according to a phenomenon
known as spin tunnelling and the cells are often referred
to as magnetic tunnel junction storage cells. The
condition of the storage cell is determined by measuring
the sense current (inversely proportional to resistance) or a
related parameter such as response time to discharge a 
known capacitance. 


Step 303 comprises comparing the obtained parametric
values to one or more predicted ranges. The comparison of
step 303, in almost all cases, allows a logical value
(e.g. one or zero) to be established for each cell.
However, the comparison also conveniently allows storage
cells affected by at least some forms of physical failure
to be identified. For example, it has been determined
that a shorted bit failure leads to a very low resistance
value in all cells of a particular row and a particular
column. Also, open-bit failures can cause a very high
resistance value for all cells of a particular row and
column. By comparing the obtained parametric values
against predicted ranges, cells affected by failures such
as shorted-bit and open-bit failures can be identified
with a high degree of certainty.

Figure 4 is a graph showing, as an illustrative example,
the probability (p) that a particular cell will have a
certain parametric value, in this case resistance (r),
corresponding to a logical "0" in the left-hand curve, or
a logical "1" in the right-hand curve. As an arbitrary
scale, probability has been given between 0 and 1, whilst
resistance is plotted between 0 and 100%. The resistance
scale has been divided into five ranges. In range 401,
the resistance value is very low and the predicted range
represents a shorted-bit failure with a reasonable degree
of certainty. Range 402 represents a low resistance value
within expected boundaries, which in this example is
determined as equivalent to a logical "0". Range 403
represents a medium resistance value where a logical value
cannot be ascertained with any degree of certainty. Range
404 is a high resistance range representing a logical "1".
Range 405 is a very high resistance value where an
open-bit failure can be predicted with a high degree of
certainty. The ranges shown in Figure 4 are purely for
illustration, and many other possibilities are available
depending upon the physical construction of the MRAM
device 1, the manner in which the storage cells are
accessed, and the parametric values obtained. The range or
ranges are suitably calibrated depending, for example, on
environmental factors such as temperature, factors
affecting a particular cell or cells and their position
within the array, or the nature of the cells themselves
and the type of access employed.
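
By way of illustration only, the comparison against the five
ranges of Figure 4 might be sketched as follows. The boundary
values (10, 40, 60 and 90 percent of full scale) and all of the
names used are assumptions made for this sketch; the description
does not specify numerical boundaries, and in practice the
ranges would be calibrated as discussed above.

```python
from enum import Enum


class CellState(Enum):
    SHORTED = "shorted-bit failure"  # range 401
    LOGIC_0 = "logical 0"            # range 402
    UNCERTAIN = "indeterminate"      # range 403
    LOGIC_1 = "logical 1"            # range 404
    OPEN = "open-bit failure"        # range 405


# Hypothetical upper bounds of each range, in percent of full scale.
# These numbers are illustrative assumptions, not values from Figure 4.
RANGE_UPPER_BOUNDS = [
    (10.0, CellState.SHORTED),
    (40.0, CellState.LOGIC_0),
    (60.0, CellState.UNCERTAIN),
    (90.0, CellState.LOGIC_1),
    (100.0, CellState.OPEN),
]


def classify_resistance(r_percent, bounds=RANGE_UPPER_BOUNDS):
    """Map a resistance reading (0 to 100 percent) to one of five ranges."""
    if not 0.0 <= r_percent <= 100.0:
        raise ValueError("resistance must lie between 0 and 100 percent")
    for upper, state in bounds:
        if r_percent <= upper:
            return state
    return bounds[-1][1]
```

A reading of 5 percent would be classified here as a shorted-bit
failure, and a reading of 50 percent as indeterminate.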

Referring again to Figure 3, step 304 comprises
counting a number of physical failures, preferably on the
basis of failed cells identified in the comparison of step
303. Suitably, the count of parametric failures in step
304 is performed on the basis of the number of symbols 206
(each containing one or more bits) which would, in use, be
affected by the identified failed cells.
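
A symbol-based count of this kind might be sketched as below.
The sketch assumes, purely for illustration, that failed cells
are identified by their bit position within a codeword and that
each symbol 206 occupies a fixed number of consecutive bit
positions (eight here); neither assumption is taken from the
description.

```python
def count_failed_symbols(failed_bit_positions, bits_per_symbol=8):
    """Count the distinct symbols touched by at least one failed cell.

    failed_bit_positions: bit indices of failed cells in a codeword
    bits_per_symbol: assumed symbol width (hypothetical default of 8)
    """
    if bits_per_symbol < 1:
        raise ValueError("bits_per_symbol must be at least 1")
    # Several failed cells in the same symbol count only once.
    failed_symbols = {pos // bits_per_symbol for pos in failed_bit_positions}
    return len(failed_symbols)
```

For example, failed cells at bit positions 0, 1, 7, 8 and 23
affect three symbols, so the count is three rather than five.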

Step 305 comprises comparing the number of parametric
failures, i.e. the number of failed symbols identified by
parametric testing, against a predetermined threshold
value. The number of physical failures can be represented
in any suitable form. Depending upon the nature of the
ECC scheme employed, some types of failure can be weighted
differently to other types of failure. Since, in use, the
data to be stored in the storage cells represents encoded
data, it is expected that ECC decoding will not be able
to recover the original data reliably where the
number of parametric failures is greater than the maximum
power of the ECC scheme. Hence, the threshold value is
suitably selected to represent a value which is equal to
or less than the maximum number of failures which the ECC
scheme employed is able to correct. Preferably, the
threshold value in step 305 is selected to be
substantially less than the maximum power of the ECC
decoding scheme, suitably of the order of 50% to 95% of
the maximum power. In a particular preferred embodiment
the threshold value in step 305 is selected to represent
about 50% to 75% and suitably about 60% of the maximum
power of the employed ECC scheme. Preferably, the step
305 comprises determining the number of parametric
failures to be greater than the threshold value, such
that, in use, performing ECC decoding is expected (with a
sufficiently high probability) not to be able to correctly
recover information from the encoded data. That is, where
the number of parametric failures is greater than the
threshold value, there is a greater than acceptable
probability that information is unrecoverable from the
encoded data, or that a miscorrect will occur.
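
As a simple worked illustration, the threshold selection and
comparison of step 305 might be sketched as follows. The
function names, the rounding down to a whole number of symbols,
and the example figure of 16 correctable symbols per codeword
are assumptions made for this sketch only.

```python
import math


def failure_threshold(max_correctable, fraction=0.6):
    """Return a threshold set at a fraction of the maximum ECC power.

    max_correctable: maximum number of failed symbols the ECC scheme
        can correct per codeword (a hypothetical input)
    fraction: share of that maximum used as the threshold; about 0.6
        corresponds to the particular preferred embodiment
    """
    if max_correctable < 0:
        raise ValueError("max_correctable must not be negative")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Rounding down is an assumption; it keeps the threshold conservative.
    return math.floor(max_correctable * fraction)


def exceeds_threshold(failed_symbol_count, threshold):
    """True where the failure count is greater than the threshold."""
    return failed_symbol_count > threshold
```

With an assumed maximum power of 16 symbols and a fraction of
60%, the threshold is 9 symbols, so a codeword with 10 failed
symbols exceeds the threshold whereas one with 9 does not.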

Step 306 comprises determining whether or not to
continue use of the set of cells corresponding to the
accessed block of data, in view of the number of
parametric failures which have been identified. If
desired, remedial action can be taken. Such remedial
action may take any suitable form, to manage future
activity in the storage cells 16. As one example, the set
of storage cells 16 corresponding to a codeword 204 or to
a complete encoded sector 202 is identified and
discarded, in order to avoid possible loss of data in
future. In the currently preferred embodiments it is most
convenient to use or discard sets of storage cells
corresponding to an encoded sector 202, although greater
or lesser granularity can be applied as desired. In the
preferred embodiment, each sector comprises four
codewords, and a sector is made redundant where any one of
its four codewords contains a number of failures which is
greater than the threshold value of step 305.
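
The sector-level decision of the preferred embodiment might be
sketched as follows. The function name and the representation
of a sector as a list of per-codeword failure counts are
assumptions made for illustration.

```python
def sector_is_usable(codeword_failure_counts, threshold):
    """Decide whether a sector may remain in use.

    codeword_failure_counts: failed-symbol count for each codeword of
        the sector (four codewords in the preferred embodiment)
    threshold: the threshold value of step 305

    The sector is made redundant as soon as any one codeword has a
    failure count greater than the threshold.
    """
    return all(count <= threshold for count in codeword_failure_counts)
```

For instance, with a threshold of 9, a sector whose four
codewords have 0, 3, 9 and 2 failed symbols remains in use,
whereas a sector with counts of 0, 3, 10 and 2 is discarded.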

The test method of Figure 3 is particularly useful as 
a test procedure immediately following manufacture of the 
device, or at installation, or at power up, or at any 
convenient time subsequently. In one example, the test
procedure of Figure 3 is performed by writing a test set
of data to the device and then reading from the device, or
by any other suitable parametric testing. In particular,
it is useful to apply the method of Figure 3 to identify
areas of the MRAM device which are severely affected by
systematic errors caused by manufacturing imperfections,
and remedial action can then be taken before the device is
put into active use storing variable user data.

The parametric evaluation of Figure 3 is particularly
useful in determining shorted-bit and/or open-bit failures
in MRAM devices. A systematic failure, such as a
half-select failure or some forms of isolated-bit failure,
is not so easily detectable using parametric tests. Even
so, by selecting an appropriate threshold value, the test
method is able to provide a practical device which is able
to take advantage of the considerable benefits offered by
the new MRAM technology whilst minimising the limitations
of currently available manufacturing techniques.

A second preferred test method will now be described 
with reference to Figure 5, using logic-based evaluation. 

In step 501, test data is written to a selected set of
storage cells 16. This set suitably represents the same
set as used for parametric evaluation in the method of
Figure 3. The test data may take any suitable form,
according to any suitable logical structure. For example,
the test data may, or may not, include ECC encoded data.

In step 502, the test data is read from the set of 
storage cells.

In step 503, the written test data and the read test
data are compared to identify suspected failed cells. If
desired, steps 501 and 502 can be repeated one or more
times, to increase confidence that failed cells have been
correctly identified. Many different types of failures
can be identified. By selecting appropriate test data,
failed cells affected by shorted-bit and/or open-bit
failures can be identified, but the method is particularly
useful in identifying cells affected by half-select
failures or single-bit failures.
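
Steps 501 to 503 might be sketched as below, using a simulated
set of storage cells. The class SimulatedCellSet, its simple
stuck-at fault model and the choice of all-zero and all-one test
patterns are hypothetical and serve only to make the sketch
runnable; the description leaves the form of the test data open,
and a stuck-at model does not represent half-select failures.

```python
class SimulatedCellSet:
    """Hypothetical stand-in for a set of storage cells.

    stuck maps a cell index to the bit value that cell always holds.
    """

    def __init__(self, size, stuck=None):
        self.bits = [0] * size
        self.stuck = dict(stuck or {})

    def write(self, data):
        # A stuck cell ignores the written value.
        self.bits = [self.stuck.get(i, bit) for i, bit in enumerate(data)]

    def read(self):
        return list(self.bits)


def find_failed_cells(cells, size, patterns=None):
    """Write each test pattern, read it back, and collect mismatches."""
    if patterns is None:
        # Assumed test data: complementary all-zero and all-one patterns.
        patterns = [[0] * size, [1] * size]
    failed = set()
    for pattern in patterns:
        cells.write(pattern)       # step 501
        readback = cells.read()    # step 502
        failed.update(             # step 503
            i for i, (w, r) in enumerate(zip(pattern, readback)) if w != r
        )
    return sorted(failed)
```

Using two complementary patterns means that a cell stuck at
either logical value disagrees with at least one of the written
patterns.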

Step 504 comprises forming a count of 
logically-identified failures. Similar to step 304, this 
count is suitably performed on the basis of the number of
symbols 206 (each containing one or more bits) which
would, in use, be affected by the identified failed cells. 

Step 505 comprises comparing the failure count against 
a predetermined threshold value. This comparison is
preferably similar to the comparison performed in the
parametric evaluation. The threshold value is suitably
selected to represent a value which is equal to or less
than the maximum number of failures which the ECC scheme
to be employed in use is able to correct. In one
embodiment, the threshold value is selected to be of the 
order of 50% to 95% of this maximum power. 

Step 506 comprises determining whether or not to 
continue use of the set of cells, in view of the failure
count based on logically identified failed cells. 
Remedial action can be taken if desired, as discussed for 
step 306. 

Figure 6 shows a preferred test method combining both
parametric evaluation and logical evaluation.

Step 601 comprises accessing a set of storage cells. 
In step 602, failed cells are identified with parametric
evaluation as discussed above in the method of Figure 3.
In step 603, failed cells, ideally with different types of
physical failures, are identified with logical evaluation
as discussed in Figure 5. The logical failures and
parametric failures are counted in step 604, and this
failure count is compared against a threshold in step 605.
At step 606, a decision is made whether to continue with
active use of the accessed storage cells. Ideally,
logical evaluation and parametric evaluation are combined
in order to identify failed cells from a greater range of
physical failures than is possible with either method
alone.
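
The combined count and decision of steps 604 to 606 might be
sketched as follows. Merging the two sets of failed cells before
counting symbols, so that a cell found by both evaluations is
counted once, is an assumption of this sketch, as are the
function name and the fixed symbol width.

```python
def combined_decision(parametric_failed, logical_failed,
                      bits_per_symbol, threshold):
    """Return (failed_symbol_count, continue_use) for a set of cells.

    parametric_failed: bit positions of cells failed in step 602
    logical_failed: bit positions of cells failed in step 603
    bits_per_symbol: assumed symbol width in bits
    threshold: threshold value applied in step 605
    """
    # Step 604: merge both sets of failed cells and count by symbol.
    failed_cells = set(parametric_failed) | set(logical_failed)
    failed_symbols = {pos // bits_per_symbol for pos in failed_cells}
    count = len(failed_symbols)
    # Steps 605 and 606: continue use only if the count does not
    # exceed the threshold.
    return count, count <= threshold
```

For example, parametric failures at bit positions 0 and 9
together with logical failures at positions 9 and 40 affect
three 8-bit symbols, which exceeds a threshold of two.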

The MRAM device described herein is ideally suited for 
use in place of any prior solid-state storage device. In 
particular, the MRAM device is ideally suited for use
either as a short-term storage device (e.g. cache memory) or
as a longer-term storage device (e.g. a solid-state hard disk).
An MRAM device can be employed for both short term storage 
and longer term storage within a single apparatus, such as 
a computing platform. 

A magnetoresistive solid-state storage device and a 
method for testing such a device have been described. 
Advantageously, the storage device is able to tolerate a 
relatively large number of errors, including both 
systematic failures and transient failures, whilst 
successfully remaining in operation with no loss of 
original data. Simpler and lower cost manufacturing 
techniques are employed and/or device yield and device 
density are increased. As manufacturing processes improve, 
overhead of the employed ECC scheme can be reduced. 
However, error correction coding and decoding allows 
blocks of data, e.g. sectors or codewords, to remain in 
use, where otherwise the whole block must be discarded if 
only one failure occurs. Therefore, the preferred 
embodiments of the present invention avoid large scale 
discarding of logical blocks and reduce or even eliminate 
completely the need for inefficient control methods such 
as large-scale data mapping management or physical 
sparing.